a specific protein function from a constructed machine learning
Schietgat, et al., 2010]. A linear model is easy to interpret, but it
cannot model complex data. A nonlinear algorithm such as MLP, RBFNN or
SVM can model complex data, but it offers little power to explain what
the model has done.

Data handling is also a key concern when modelling a data set. Data
types can differ widely from one application to another. For instance,
they might be categorical or otherwise non-numerical. The protease
cleavage pattern discovery problem always deals with non-numerical
data, i.e., the amino acids. Without an encoding process, the
aforementioned machine learning algorithms cannot be applied to the
protease cleavage pattern discovery problem. This is a challenge for
these algorithms.
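
To make the encoding step concrete, the following is a minimal sketch of one common choice, one-hot (orthogonal) encoding, in which each residue becomes 20 binary values. It is an illustration only and not necessarily the encoding used in this text; the function name `encode_peptide` and the example peptide are assumptions made for the sketch.

```python
# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(peptide):
    """Return a flat one-hot vector: 20 binary values per residue."""
    vector = []
    for residue in peptide.upper():
        if residue not in INDEX:
            raise ValueError(f"unknown amino acid: {residue!r}")
        bits = [0] * len(AMINO_ACIDS)
        bits[INDEX[residue]] = 1  # mark the position of this residue
        vector.extend(bits)
    return vector


# Hypothetical 4-residue peptide, used only for illustration.
encoded = encode_peptide("ACDK")
print(len(encoded))  # 4 residues x 20 positions = 80
```

The resulting numerical vector can be passed to an algorithm that accepts only numerical inputs. Note that the input dimension grows by 20 for every residue in the peptide.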

Moreover, model construction complexity has been a concern for some of
the aforementioned machine learning algorithms. This is because they
require lengthy model construction and a long generalisation test to
overcome the model …ty problem, as is the case for the MLP algorithms.

Finally, dimensionality is also a concern when employing the
aforementioned machine learning algorithms. Some of these algorithms
are unable to handle data in which the dimension is greater than the
number of samples, because statistical significance cannot be
established during the learning process.
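
One way to see the difficulty is that, when there are more dimensions than samples, the data cannot single out one model. The sketch below uses hypothetical toy data (two samples, three features) and two hand-picked weight vectors for a linear model without an intercept. Both fit the training data exactly, yet they disagree on a new sample. All names and values are assumptions chosen for illustration.

```python
def predict(weights, sample):
    """Linear model without intercept: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, sample))


# Hypothetical toy data: 2 samples, 3 features (dimension > number of samples).
X = [[1, 0, 1],
     [0, 1, 1]]
y = [1, 2]

# Two different weight vectors, chosen by hand for this illustration.
w_a = [1, 2, 0]
w_b = [0, 1, 1]

fits_a = [predict(w_a, row) for row in X]
fits_b = [predict(w_b, row) for row in X]
print(fits_a == y, fits_b == y)  # both models reproduce the training targets

# The two models give different predictions for an unseen sample.
new_sample = [1, 1, 0]
print(predict(w_a, new_sample), predict(w_b, new_sample))
```

Because the training data support both models equally well, no learning process can justify preferring one over the other without further assumptions or more samples.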

The working principle of inductive learning

Inductive learning approaches are well recognised for their ability to
overcome these limitations of the aforementioned machine learning
algorithms. The most commonly used inductive learning approaches are
the decision tree algorithm (DT) [Quinlan, 1986] and the classification
and regression tree algorithm (CART) [Breiman, et al., 1984]. These
algorithms have also been employed in many biological and medical
pattern analysis projects.

The basic principle of DT and CART is “divide and conquer”, a concept
that has been exercised in computer science since the 1970s